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Abstract 


This  report  is  concerned  with  developing  and  implementing  scalable  algorithmic 
approaches  for  solving  coupled  micro  and  macro  structural  applications  based  on  the 
asymptotic  expansion  homogenization  (AEH)  method.  AEH  for  nonlinear  and  short 
transient  loading  applications  is  computationally  demanding  and  cannot  be  addressed  on 
serial  computers.  Hence,  the  emphasis  of  the  present  investigation  is  to  develop  and 
implement  consistent  AEH  numerical  formulations  on  scalable  computers  to  address 
elasto-plastic  material  response  of  structures  subjected  to  short  transient  loading.  A 
second  order,  accurate  velocity-based,  explicit  time  integration  method,  in  conjunction 
with  the  AEH  method  on  scalable  computing  architectures,  is  implemented  using 
message-passing  interface  (MPI).  Two  different  scalable  implementation  approaches 
are  discussed,  and  the  scalability  of  the  computational  approach  is  demonstrated. 
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1.  Introduction 


Many  practical  military  structural  analysis  problems  are  concerned  with  heterogeneous  media 
subjected  to  short  impact  or  transient  loading.  For  these  heterogeneous  media  applications,  the 
challenges  include  accurately  assessing  various  failure  modes  at  the  micro-mechanics  level, 
expressing  constitutive  model  with  effective  material  properties  at  the  macro-mechanics  level, 
and  understanding  the  relationship  between  micro  and  macro  levels.  The  effective  material 
properties  can  be  obtained  either  by  using  mathematical  homogenization  theory  [1-5]  or  its 
engineering  coimterpart  [6-10].  For  short  transient  impact  loading  condition,  a  detailed 
knowledge  of  the  material  flow  in  and  aroimd  material  micro  structural  constituents,  and  material 
interfaces  is  necessary.  This  flow  is  three-dimensional,  transient,  nonlinear,  and  the  controlling 
global  loads  are  of  short  duration  impact.  Classical  theories  such  as  rule  of  mixtures  that  are 
based  on  constant  stress  or  strain  assumption  leads  to  an  aggregation  of  the  response  in  the 
micro-structural  details  and  thus  are  of  limited  use. 

Recent  advances  in  mathematically  rigorous  asymptotic  expansion  homogenization  (AEH) 
methods  enable  the  coupling  of  micro  and  macro  approaches  for  both  linear  and  nonlinear 
stmctural  applications  [11-15].  Most  recently,  Chung  et  al.  [16]  demonstrated  the  applicability 
of  AEH  method  for  heterogeneous  media  subjected  to  short  transient  loading  by  employing 
explicit  dynamics  finite  element  formulations  in  conjunction  with  elasto-plastic  material 
response.  Using  the  AEH  approach,  the  micro-level  variables  were  expressed  as  direct  functions 
of  the  macro-level  variables  m  a  strict,  mathematically  seamless  approach.  Their  work  [16]  was 
especially  suitable  for  conducting  explicit  transient  elasto-plastic  analysis  of  heterogeneous 
materials.  An  updated  Lagrangian  scheme  for  small  strains  and  small  rotations  is  employed  to 
account  for  large  displacements,  strains,  and  rotations  over  many  time  steps.  At  each  time, 
the  equilibrium  solution  from  the  previous  time  step  is  used  to  update  the  position  of  the  nodal 
coordinates  such  that  the  next  simulation  computes  its  solution  based  on  the  new  equilibrium 
configuration.  Updating  the  mesh  at  every  solution  interval  allows  for  the  subsequent  interval 
to  employ  the  last  equilibrium  state  as  its  reference  state.  Since  an  explicit  time  integration 
scheme  takes  quite  a  large  number  of  time  steps  and  the  macro-  and  micro-level 
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computations  are  to  be  performed  at  each  time  step  and  for  each  global  finite  element,  the 
attendant  micro  and  macro  computations  can  take  much  longer  times  for  solution.  Scalable 
computational  approaches  are  needed  to  explore  these  computationally  intensive  methods. 

General  purpose  explicit  dynamics  codes  like  PARADYN  [17]  and  PR0NT03D 
[18]  demonstrated  large  scale  analyses  on  a  variety  of  scalable  computing  platforms  using 
message-passing  interface  (MPI)  libraries  and  domain  decomposition.  Macro-mechanics 
implementation  of  the  proposed  method  on  scalable  computers  is  similar  to  these  general 
purpose  explicit  dynamics  codes.  The  investigation  of  Chung  et  al.  [16]  considers  the  dynamic 
equation  of  motion  discretized  according  to  a  second  order  accurate  scheme  as  presented  in 
Namburu  [19].  In  addition  to  macro-mechanics  explicit  formulations,  the  proposed  AEH 
formulation  requires  coupling,  and  solving  the  local  micro-structural  response  in  terms  of  the 
global  macro  stractural  response.  One  strategy  that  was  explored  extensively  was  based  on  using 
a  global  domain  decomposition  scheme  for  spreading  the  global  finite  element  computations 
across  a  grid  of  processing  nodes,  and  then  using  a  separate  set  of  processors  for  the  micro¬ 
element  computations.  In  a  later  extension  of  this  approach,  the  global  processors  that  are 
finished  with  their  part  of  computations  ahead  of  other  processors  were  put  to  use  in  helping  with 
the  micro-element  computations  in  other  domains.  Scalability  of  the  approach  is  demonstrated 
for  a  Taylor  impact  problem.  The  results  obtained  on  an  IBM-SP2  showed  consistent  scaling 
with  2, 4,  8, 16, 32,  and  64  processor  computations. 

The  outline  of  this  report  is  as  follows.  Section  2  introduces  the  fimdamental  ideas  of  the 
mathematical  and  numerical  formulations  of  the  asymptotic  expansion  homogenization  method. 
Section  3  discusses  the  algorithmic  issues  and  implementation  of  the  AEH  method  on  scalable 
computers.  ScalabiUty  of  the  approach  is  discussed  in  section  4. 

2.  Mathematical  Formulations 


2.1  Governing  Equations.  Assmne  a  three-dimensional  body  is  an  assembly  of  periodic 
stractures  containing  different  material  as  shown  in  Figure  1.  Typically,  the  imit  cell  is  very 
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Figure  1.  Macro-Micro  Analyses’  Neighborhoods. 


small^  of  order  s  (where  ^  is  a  small  positive  number)  compared  to  the  dimensions  of  the  problem 
domain.  Structural  response  quantities  such  as  displacements,  velocities,  stresses,  and  strains  are 
assumed  to  have  slow  variations  (macroscopic)  jfrom  point  to  point  as  well  as  fast  (microscopic) 
variations  within  a  small  neighborhood  f  of  a  given  point  x.  Let  F  be  a  (periodic)  representative 
part  of  Ci.  Here  we  distinguish  two  scales:  the  macroscopic  scale  (xefl)  and  microscopic  scale 
(ye  H).  Lagrangian  representation  of  conservation  of  mass  and  momentum  equations  are  used  in 
this  study.  The  conservation  of  mass  is  used  to  calculate  the  current  density  from  the  initial 
density.  The  momentum  equation,  the  kinematic  relations,  and  the  constitutive  model  are 
represented  by 


(1) 

(2) 
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and 


<Tl=S,('ei,E), 


(3) 


where  p  is  the  density,  is  the  body  force  vector,  E  is  the  internal  energy,  the  superscript  s 
denotes  micro/macro  continuum  solutions,  v,  is  the  velocity  vector,  is  the  stress  tensor,  and 
Cy  is  the  strain  tensor.  The  initial  conditions  in  the  domain  E2  ^ ,  boundary  conditions  on  the 
surface  of  the  domain  F  (  F  are  given  by 


and 


ii 

0)  =  <» 

(4) 

V,*((  = 

0)  =  v;, 

(5) 

on  57], 

(6) 

II 

on  dF2, 

(7) 

where  m,-  is  the  displacement  vector  and  «,•  is  a  surface  normal. 

2.2  Time  Integration.  Second  order  accurate  Lax- Wendroff  based  explicit  time  integration 
procedure  [20]  is  employed  for  the  conservation  equation  or  equation  of  motion.  In  this 
approach,  the  dependent  variable  velocity  is  first  discretized  in  time  using  second  order  Taylor 
series  expansion. 


=  v;  +Zirvf 


(8) 


Taking  appropriate  time  derivatives  of  equation  (1)  and  substituting  for  the  velocity  and 
acceleration  gives 
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(9) 


P^i  -  P^i  -’^jij  +Pfi 

The  stress  increment  is  related  to  a  strain  increment  through  an  appropriate  constitutive  equation. 
The  elasto-plastic  constitutive  equation  at  midpoint  time  increment  is  defined  by 


erf .  =M.i,((7- . *  +C„  s‘^"  ) 


(10) 


Stresses  and  strain  rates  at  midpoint  time  can  be  evaluated  using  the  predicted  displacements  as 
shown: 


«+l/2  n  At  -n+l/2 

U,  =U:  +  - V. 

'  2 


(11) 


2.3  Asymptotic  Expansion  and  Elasto-plastic  Constitutive  Equations.  The 

homogenization  method  is  based  on  the  asymptotic  expansion  of  the  primary  variables  together 
with  a  unit  cell  approach  for  a  heterogeneous  structure.  Assume  a  three-dimensional  body  Q  is 
an  assembly  of  periodic  structures  (1).  Typically,  the  unit  cell  is  very  small,  of  order  f  (^  is  a 
small  positive  number)  compared  to  the  dimensions  of  the  problem  domain.  Typically,  two- 
scale  asymptotic  expansion  can  be  employed  to  approximate  the  displacement,  velocity,  strain, 
or  stress  fields. 


{X,y)  =  (x, y)  +  (x, y)  +  (x,y)  +  •  •  • , 

(12) 

vf  {x,  y)  =  (x,  y)  +  (x,  y)  +  (x,  y)  +  •  •  • , 

(13) 

<  (^>T)  =  (x,  y)  +  £<2,.  (x,  y)  +  (x,y)  +  •  •  • , 

(14) 

o-f  (^,T)  =  o-o,  {x,  y)  +  £<7,.  (x,y)  +  £  V2_  (x,y)  +  •  •  • , 

(15) 
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where  u‘^{x,y),  vf{x,y),  ef(x,y),  and  a^(x,y)  are  Y-periodic  functions. 


The  AEH  approach  for  nonlinear  applications  is  based  on  the  instantaneously  linearized 
assumption  for  the  constitutive  model,  Terada  and  Kikuchi  [12],  The  fundamental  assumption  of 
the  AEH  approach,  therefore,  is  that  the  true  solution  in  the  s  space  is  decomposed  into  a  macro 
space  X  and  a  micro  space  y.  The  basic  assumption  here  is  that  multiple  scales  exist  only  in  the 
spatial  variables  and  that  no  such  scaling  exists  for  the  time  variable.  Further  reading  in  space 
and  time  asymptotic  expansion  approaches  can  be  found  in  Benoussan  et  al.  [4].  The  general 
approach  to  heterogeneous  problems  is  to  separate  and  draw  clear  distinction  between  the  micro- 
and  macro-level  equilibrium  equations  regardless,  per  the  earlier  linearized  assumption,  of 
material  nonlinearity.  This  is  accomplished  by  asymptotically  expanding  the  primary  variables 
where  the  asymptotic  scale  is  approximated  to  the  second  order. 


vf  (X, y)  =  v^,  {x, y)  +  (x, y)  + s\ix,y)  +  --- .  (1 6) 

Spatial  gradients  in  £--space  are  taken  with  respect  to  the  x-coordinate  system.  The 
corresponding  gradient  for  Y-periodic  functions  (where  micro  and  macro  are  now 
distinguishable)  is  given  by  the  chain  rule  where  the  scaling  is  defined  by  y  =  x/s,  and  hence  for 
any  Y-periodic  function  (f),  the  earlier  gradients  are  replaced  by 


d(t)  ^  d<l>  ^  \  d<l) 

dX;  dX;  S 


Then,  using  equation  (1)  in  equation  (7)  while  considering  equation  (2)  gives  the  rate  of  strain 
tensor  defined  by 


^  =  [e,y  (V, )  +  el  (v, )]  -h  (v, )  +  el  (v^ )]  +  • 


(18) 


where  the  symmetrized  gradient  tensors,  and  e^y  are  defined  by 
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(19) 


and 


^dxj  dx- 


dyj 


^ 


(20) 


The  relevant  expressions  for  gradients  and  velocities  are  first  substituted  into  the  governing 
equations  of  motion.  Next,  the  micro  and  macro  equations  are  identified  by  selecting  the 
appropriate  coefficients  to  the  scaling  factors  f  which  must  each  be  identically  zero.  Substituting 
the  asymptotically  expanded  velocities,  equation  (1),  in  the  strain  rate,  equation  (3),  and  then 
using  the  constitutive  equation,  equation  (16),  in  the  equations  of  motion,  equation  (15),  yields  a 
set  of  equations  dependent  on  powers  of  s.  To  satisfy  the  equations  of  motion,  each  term 
associated  with  each  of  the  powers  of  s  must  approach  zero  identically.  This  leads  to  a  set  of 
equations  associated  with  the  microscopic  and  macroscopic  equations  of  motion.  The  first 
equation,  associated  with  the  powers  of  e~^  is  given  by 


dy, 


=  0, 


(21) 


where  Equation  (21)  states  that  v”'‘  is  a  function  only  of x.  Hence,  derivatives 

ofy  are  zero.  Using  this  inference,  the  equation  associated  with  the  powers  of  £~^  is  derived  as 


-C  _ ^  + 


n-i 


dyj  ax, 


dyj  dy, 


1  a 

Atdyj 


(22) 


Equation  (22)  relates  the  perturbative  velocity  field,  v„  to  the  macroscopic  velocity  field,  Vq. 
An  equation  relating  these  two  quantities  become  the  important  micro-macro  equation,  providing 
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the  direct  link  to  relate  microscopic  to  the  macroscopic  velocities.  Such  an  equation  is  also 
called  the  localization  equation  due  to  its  features.  The  localization  equation  is  given  by 


vr’  = 


dv. 


»-i 


=  -;ir/ 


(23) 


where  v,^  is  a  constant  of  integration,  and  the  corrector  functions  (sometimes  referred  to  as  the 
so-called  characteristic  functions)  are  the  solutions  to  the  equations 


mn 

k 


dyj 


— c 

ayj 


and 


-Cjju 


dzl 


dyj 


(24) 


(25) 


Equations  (24)  and  (25)  are  the  corrections  which  account  for  the  shape  of  the  interface 
separating  various  phases  and  the  time-dependent  plastic  softening  effect  due  to  material 
nonlinearities,  respectively.  Clearly,  for  a  homogeneous  problem,  both  correctors  are  identically 
zero.  In  the  event  where  no  plastic  yielding  occurs,  the  problem  degenerates  to  a  transient  elastic 
problem.  These  equations  arising  fi-om  equation  (22)  constitute  the  microscale  problem. 

Finally,  the  equation  associated  with  the  powers  of  is  written  as 


n-1 


5 

dx, 


dx,^ 


n-\ 


n-\ 


^ijkl 


'7 

P 


dy^  dyj  dx^  dyj  dy^ 


At^ 


Av^^^  — 


Atdxj^^ 


(26) 


Af 
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Much  like  equation  (22)  provided  the  basis  for  the  micro-scale  problem,  equation  (26)  provides 
the  basis  for  the  macro-scale  problem.  By  taking  the  volume  average  of  equation  (26), 
substituting  for  v,  using  the  localization  equation  (23)  and  noting  the  periodicity,  the  final 
governing  equation  of  motion  is  given  by 


{p)A- 


n+l 


=  At- 


dXj 


»+l/2 


where  the  “corrector  stress,”  ,  is  defined  by 

^  dy,  dyi  dx„  J 

and  where  quantities  in  brackets  denote 


(27) 


(28) 


(29) 


It  is  of  interest  to  note  that  equation  (27),  for  heterogeneous  conditions,  is  similar  in  form  to 
equation  (6),  the  homogeneous  equation,  making  the  integration  of  a  micromechanical  problem 
into  a  macro  analysis  tractable  and  straightforward. 

2.4  Constitutive  Equation:  Elasto-plastic.  The  stress  tensor  is  split  into  two  parts.  The 
first  part,  S,)-,  is  deviatoric  stress,  which  is  related  to  material  strength,  and  the  second  part  is 
pressure  P.  To  simplify  notation,  the  superscript  ^■has  been  omitted. 

+  (30) 
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and 


(31) 

The  first  term  in  equation  (30)  accounts  for  volumetric  changes  and  is  typically  evaluated 
fi'om  equations  of  state  in  explicit  dynamics.  The  second  term,  the  stress  deviator,  is  related  to 
the  deformation  of  the  material  and  is  defined  by  a  constitutive  model,  and  in  particular,  the  time 
rate  of  change  of  the  deviator  is  evaluated  fi-om  the  strain  rates. 

S,j=2GSy+A^j,  (32) 

where  G  is  the  shear  modulus  and  J,y  is  the  correction  for  rigid  body  rotation. 

An  isotropic  hardening  model  with  a  rate-dependent  Von  Mises  yield  condition  is  employed 
in  the  present  analysis.  A  consistency  condition  ensures  that  the  stress  state  remains  on  the  yield 
surface  at  the  start  and  end  of  a  time-step.  Such  a  condition  is  given  by 

(33) 

The  variable  R  is  the  rate-dependent  radius  of  the  yield  surface  or  the  apparent  yield  stress,  Qy  is 
the  vector  specifying  the  normal  direction  to  the  yield  surface,  and  Sy  is  the  deviatoric 

component  of  the  stress.  The  stresses  are  imderstood  to  be  the  co-rotated  second-order  tensor 
according  to  the  Jaumaim  definition  of  the  co-rotational  derivative.  The  Jaumann  derivative  of 
the  Cauchy  stress,  &y  is  related  to  the  material  time  derivative,  &y  by 

&fj  =  &y  +  +  co,,a,j ,  (34) 

where  is  the  rotation  tensor,  the  standard  skew-symmetric  component  of  the  velocity 
gradient.  The  radius  of  the  yield  surface  is  defined  by 
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1  + 


^1} 

yDj 


+  H’£ 


>oP 


(35) 


where  (jy  is  the  static  yield  stress  of  the  material  and  D  and  p  are  the  so-called  fluidity 
parameters.  H'  is  the  hardening  parameter,  and  J  ^  is  the  effective  plastic  strain.  The  effective 
rate  of  deformation,  e  is  defined  by 


e  = 


(36) 


where  is  the  deviatoric  component  of  the  total  strain  Sy.  The  normality  condition,  specified  by 
the  normal  vector  Qip  is  given  by 


S„ 


e 

mn^  mn 


(37) 


For  a  homogeneous  material  (which  is  applicable  in  the  present  derivation  since  the  constitutive 
equations  are  applied  at  the  micro  level  where  the  material  is  homogeneous  within  each  phase) 
the  incremental  relationships  for  the  apparent  yield  stress  and  the  deviatoric  stress  are  defined  as 

+-H'AX,  (38) 

3 

and 

=Sf  -2GMQ^,  (39) 


11 


where  iS',y'^'  is  the  deviatoric  trial  stress,  AA  is  a  scalar  quantity  representing  the  magnitude  of 

the  radial  return  correction  due  to  plastic  yielding,  and  G  is  the  elastic  shear  modulus.  The 
deviatoric  trial  stress  is  defined  by 


sf  ,  (40) 

where  Cijki  is  the  tensor  containing  elastic  properties. 

Finally,  substituting  equations  (41)  and  (39)  into  equation  (33)  and  solving  for  the  radial  return 
correction  parameter  gives 


AA  =  - 


3/2 


(3G  +  H') 


R - 

\ 

rin+l^  on+1^ 
2^'^  {/ 

\ 

n 

1  " 

J 

(41) 


Using  equation  (40)  in  equation  (41)  and  employing  the  standard  assumption  that  a  material  is 
elastic  in  its  dilatational  behavior  and  plastic  only  in  shear  gives  the  total  stress  increment  in 
equation  (16).  The  constitutive  equation  (16)  can  now  be  employed  in  the  derivation  of  the 
micro  and  macro  governing  equations. 


3.  Computational  and  Implementation  Issues 


3.1  Macro  Finite  Element  Equations.  Finally,  in  the  absence  of  damping,  the  discretized 
equations  are  given  in  matrix  form  by 

,  (42) 

where  the  displacements  are  given  by 
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uf^u'+Atli-ry-^r^r] 


(43) 


and 


M=lN,pN,dn,, 

(44) 

(45) 

(46) 

=  At  f  , 

(47) 

Jv"^*=v''^’-v",and 

(48) 

(49) 

The  subscript  e  denotes  element  quantities,  and  is  a  free  parameter.  The  volxime  averaging 
brackets  are  implied  for  densities  and  stresses.  The  stresses  are  defined  by 


#n+l/2 


—  n+1/2  _  c 

(J..  —  (T., 


(50) 


where  is  the  stress  corrector.  The  mass  matrix  M  in  equation  (44)  is  lumped  for  the  present 

explicit  formulation.  The  vectors  F\,  F2,  and  F3  are  due  to  internal  forces,  boundary  tractions, 
and  external  loads,  respectively. 
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3.2  Micro  Finite  Element  Equations.  In  order  to  compute  each  term  in  equation  (44),  the 
effective  microscopic  quantities  for  density  and  stresses  must  first  be  homogenized  over  the 
representative  unit  cell  associated  with  that  macro-level  finite  element.  Despite  the  explicit 
formulation  in  the  macro-level  which  avoids  the  solution  of  a  set  of  equations,  the  micro-level 
equations  require  an  iterative  implicit  solution  of  a  sparse  equation  set.  This  is  less  computation 
than  required  in  the  quasi-static  approach  which  requires  solutions  of  both  the  micro-level 
equations  and  the  macro-level  equations.  The  present  method  avoids  the  solution  of  macro-level 
equations. 

The  discretized  forms  of  the  micro-level  equations  are  given  by 

Kx^=Fr\  (51) 

and 

(52) 

where 


^=1 

(53) 

^  micro 

(54) 

rp  micro  f 

=  J 

(55) 

Equation  (51)  possesses  six  right-hand  side  vectors  corresponding  with  the  six  columns  in 
[C*]  while  equation  (52)  possesses  one  right-hand  side.  Thus,  for  each  macro-level  element, 
seven  sets  of  equations  must  be  solved.  Since  the  stiffiiess  matrix  is  the  same  in  each  equation,  a 
solver  which  can  re-employ  its  factored  stiffiiess  matrix  is  warranted.  The  micro-level  stresses 
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are  computed  from  equation  (16),  implying  that  every  micro-level  element  stress  must  be  stored 
between  successive  time  steps,  which  are  then  averaged  to  give  the  macro-level  stresses  via  the 
volume  averaging  operator. 

For  true  generality,  the  nonlinear  equations  (51)  and  (52)  must  be  solved  for  each  macro¬ 
level  element  because  of  micro-level  stress  variations  which  may  occur.  In  light  of  the  added 
computation  required  in  multiscale  analyses,  the  computational  complexity,  a  subjective  measure 
of  the  computation  time,  increases  substantially  for  a  micro  or  macro  problem  compared  to  a 
simple  single-length  scale  macro  problem. 

3.3  Scalable  Implementation.  This  section  discusses  scalable  implementation  aspects  of 
explicit  macro  finite  element  formulations  and  implicit  micro  finite  element  formulations.  To 
enhance  the  scalable  speed-up,  two  different  micro-element  implementation  strategies  were 
discussed. 


33.1  Scalable  Approach  for  Macro  Finite  Element  Method.  The  macro  approach  uses  the 
self-starting  velocity  based  explicit  time  integration  algorithm  of  reference  [20]  in  conjunction 
with  the  elasto-plastic  constitutive  relations  and  large  deformation  as  shown  in  Figure  2. 

Note  that  the  explicit  time  integration  scheme  does  not  involve  any  system  of  equations  to 
be  solved.  Similar  to  other  large  deformation,  explicit  time  integration  codes,  the  present 
approach  uses  one-point  integration  to  evaluate  the  element  integrals.  First,  the  finite  element 
mesh  is  partitioned  into  subdomains  using  the  domain  decomposition  approach  based  graph 
theory  and  in  particular  METIS  software  of  reference  [21].  This  approach  gives  optimum  mesh 
partitioning  based  on  edge  cuts.  The  number  of  subdomains  partitioned  is  equal  to  the  number  of 
processors  being  used.  Next,  the  MPI  is  used  to  communicate  information  between  the 
processors.  Figure  3  depicts  the  process. 
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Given  the  following  equations  of  motion, 
subjected  to  the  following  boundary  conditions, 

and  the  initial  conditions. 


PU  +  aijj  =  fe , 

Uj  =  Ui  on  Fj,  and 
aij  =  Tion  Fij, 

Ui  (t  =  0)  =  u°i  and 
Ui(t  =  0)=uf, 


with  p  -  density,  ii  -  acceleration,  Cij  =  internal  forces,  and  fe  =  external  forces;  then  the 
numerical  algorithm  begins  with  an  assumption  for  the  time  integration. 

An  example  of  a  scheme  based  on  the  central  difference  is  given  below.  The  solution 
begins  with  the  initial  conditions  for  the  displacements  and  velocities  at  time  stqp  n  =  0.  It 
progresses  in  the  following  substeps  for  each  Atn: 


Step  1:  Predictor  step: 

Step  2:  Velocity  increment  solution  step: 


+— if 
2 

MAU"^^  =F  +F  +F 


Step  3:  Corrector  step:  u”'"' =if +— rCi”^* +u"l 

where  the  element  integrals  in  the  above  2  ^ 

are  given  as  . 

M=  ^_Nj»N,da. . 

F,=  , 

1  =A4.N,pf"*'”dn,,aiid 

Step  4:  Continue  the  corrector  step  to  update  the 
velocities,  displacements,  coordinates, 
time,  and  stresses:  .  "4  +u"  At" , 

Au""^^  =u"'^^  -u" , 

1  ' 

=x"  ^ 

^  n4  Ar+At""' 

A I  ^  = - - - ,  and 

2 

.n*1/2  _  _n+1/2  _c 
^ij  ^ij  • 

Step  5:  Check  At  for  Courant  stability  criteria  and  continue  Steps  1  through  4  until 
_ specified  time  limit. _ 

Figure  2.  A  typical  Algorithm  for  the  Explicit  Integration  of  Equations  of  Motion. 


1  1 
.  n+-  n — 
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Global  Mesh 


Domain 

Decomposition 


Domain 
processors 
compute  both 
global  and 
micro  elements 


Figure  3.  Strategy  1  Showing  Domain  Processors  Computing  Both  Micro  and  Global 
Elements. 

3.3.2  Scalable  Approaches  for  Micro  Finite  Element  Method.  As  opposed  to  the  macro 
finite  element  method  which  uses  the  explicit  time  integration  approach,  the  micro  finite  element 
approach  involves  solving  systems  of  equations.  Each  macro  element  carries  with  it  a  micio- 
element  model  representing  its  micro-structural  detail.  In  a  traditional  finite  element,  the  material 
constitutive  behavior  is  sampled  at  its  integration  points.  In  a  similar  manner,  the  constitutive 
behavior  of  a  macro-element  in  the  current  program  is  obtained  by  solving  the  respective  micro¬ 
element  equations,  at  each  time  step.  The  micro-element  solutions  include  the  micro-element 
stresses,  accumulated  plastic  strain,  and  the  elastic  and  plastic  material  properties.  Since  AEH 
theory  assures  that  these  equations  can  be  expressed  solely  in  terms  of  the  macro  element 
response  variables,  the  micro-element  model  computations  are  independent  across  the  macro 
elements  and  across  the  different  decomposition  domains.  That  is,  different  strategies  can  be 
used  to  implement  micro-element  computations  on  scalable  computers.  In  this  study,  two 
strategies  are  explored,  and  the  issues  of  implementation  and  efficiency  of  the  implementation 
are  discussed  next. 
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The  two  parallel  implementation  strategies  are  discussed. 


(1)  Domain  decomposition  for  macro  elements  with  the  domain  processors  also  doing  the 
computations  of  their  respective  micro-structural  models  (Figure  3). 

(2)  Domain  decomposition  for  macro  elements  with  each  domain  processor  assigned  with 
helper  processors  for  doing  micro-structural  model  computations  (Figure  4). 


Global  Mesh 


Domain 

Decomposition 


Domain  and 

helper 

processors 


Figure  4.  Strategy  2  Showing  Helper  Processors  Computing  Only  Micro-elements  and 
Global  Processors  Computing  Both  Global  and  Micro-elements. 


Figures  5  and  6  present  the  program  flow  for  these  two  strategies.  In  the  first  strategy,  the 
computations  begin  with  the  gathering  of  the  following  information:  the  macro-element 
coordinates,  the  micro-element  coordinates,  mechanical  properties,  plasticity  properties,  average 
macro-element  stresses,  detailed  micro-element  stresses,  average  macro-element  plastic  strains, 
detailed  micro-element  plastic  strains,  and  the  current  velocity  increments  at  the  macro-element 
nodes. 
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Figure  6.  Program  Flow  for  Strategy  2  With  Helper  Processors. 


Computations  in  each  micro-element  involve  updating  the  micro  stresses,  micro  plastic 
strains,  micro  strain  rates,  and  the  micro  mechanical  and  plastic  properties.  As  shown  in  Figure 
6,  these  computations  are  carried  with  the  aid  of  helper  processors  in  the  second  strategy.  After 
this,  the  corresponding  quantities  are  aggregated  at  the  macro  element  level  and  mass  and 
internal  force  vectors  are  computed  for  each  macro  element.  In  this  manner  after  each  macro 
element  and  its  micro-elements  are  computed,  the  mass  and  internal  force  vectors  are  assembled 
across  the  macro  elements  in  all  the  domains.  After  all  the  elements  are  computed,  the  mass  and 
internal  force  information  is  exchanged  for  the  nodes  that  are  shared  across  the  domain 
boundaries.  The  fully  assembled  mass  and  internal  force  vectors  are  later  used  for  obtaining  a 
new  set  of  accelerations.  After  updating  the  velocities  and  geometry  (and  after  checking  for 
possible  contacts  and  failures),  the  next  time  step  is  commenced.  In  each  time  st^,  the  domain 
processor  does,  these  computations  in  serial  maimer  first  by  calling  the  macro  elements  and  then 
the  micro-elements  resident  in  each  macro  element. 

The  number  of  helper  processors  (uh)  that  are  needed,  in  the  second  strategy  above,  for  each 
domain  depends  on  several  factors  including  the  micro-structural  model  size,  finite  element 
interpolations,  constitutive  behavior  of  the  material  micro-components,  and  the  complexity  of  the 
homogenization  theory  that  is  being  used.  Along  with  size  and  shape,  the  number  of  micro¬ 
elements  determines  the  complexity.  Thus,  if  one  limits  the  type  of  micro-structures  in  a  domain 
to  just  one  kind,  then  the  number  of  micro-elements  (nm)  appearing  in  each  macro-element  would 
be  constant;  and  then  one  can  begin  with  a  number  of  helper  processors  that  equals  to  this 
number  (i.e.,  nh  =  Um). 


If  Uh  is  set  equal  to  Um,  however,  the  parent  domain  processor  becomes  idle  while  the  micro¬ 
element  computations  are  going  on.  To  avoid  this,  nh  is  set  equal  to  Un,  -  1,  forcing  the  domain 
processor  participate  in  doing  one  of  the  micro-element  computations. 
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4.  Scalability  Studies 


The  results  from  the  first  strategy  (i.e.,  with  domain  decomposition  and  no  helper  processors 
for  micro-element  computation)  are  presented  in  Table  1. 


Table  1.  Results  for  the  Taylor  Problem  Without  Helper  Processors 


Column  1 

Column  2 

Column  3 

Column  4 

Column  5 

No.  of 
Processors 

Avg.  Computation 
Time  per  Domain 
Processors,  A-dc 
(s) 

Avg.  Commimication 
Time  Between  Domain 
Processors,  A,dd 
(s) 

Avg.  Wait  Time 
per  Domain 
Processors, 

(s) 

Run  Time, 

(s) 

1 

40,474 

— 

40,474 

2 

18,470 

6 

2,445 

20,921 

4 

8,526 

6 

2,397 

10,929 

8 

3,922 

5 

1,774 

5,701 

16 

1,860 

5 

1,077 

2,942 

32 

943 

7 

565 

1,515 

64 

507 

14 

336 

857 

By  the  results  in  columns  2  and  5  in  Table  1,  one  can  conclude  that  the  computation  and 
overall  run  times  are  fairly  scaled  with  the  number  of  processors.  This  was  achieved  despite  the 
fair  amount  of  wait  time  that  the  processors  had  to  endure.  This  wait  time  occurred  as  a  result  of 
having  more  elements  in  some  processors  becoming  plastic  and  thus  needing  more  micro¬ 
element  computations  than  others. 

This  wait  time  can  be  reduced  significantly  by  moving  such  elements  evenly  among  all  the 
processors.  This  was  a  result  of  the  nature  of  the  domain  decomposition  used  on  the  input  data 
set.  With  a  prior  knowledge  of  such  deformation  an  even  distribution  of  the  plastic  elements  can 
be  achieved.  Even  though  such  knowledge  is  available  in  the  present  Taylor  impact  problem, 
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elements  were  not  redistributed  because  the  results  for  the  run  time  scaled  fairly.  In  the  present 
problem,  there  was  an  impact  in  the  longitudinal  direction  causing  a  radial  distribution  of  the 
plastic  elements.  In  spite  of  this,  a  longitudinal  decomposition  was  employed  which  meant  that 
domains  away  from  the  impact  had  to  wait  until  their  elements  become  plastic. 

The  times  of  communication  among  the  domain  processors,  as  evident  from  the  results  in 
column  3  in  Table  1,  were  fairly  small.  The  results  for  the  same  problem  with  the  second 
strategy  of  computation  are  presented  in  Table  2.  In  this  strategy,  helper  processors  were 
provided  to  help  in  the  micro-element  computations.  Since  the  example  problem  had 
microstmctures  in  all  the  global  or  macro  elements,  since  each  microstructure  had  two  elements, 
and  since  the  domain  processors  are  also  used  in  all  the  micro-element  computations,  one  helper 
processor  each  was  provided  for  each  global  domain  processor.  The  results  are  presented  in 
Table  2. 


Table  2.  Results  for  the  Taylor  Problem  With  Helper  Processors 


Column 

Column 

Column 

Column 

Column 

Column 

Column 

Column 

Column 

Column 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

No.  of 

No.  of 

domain 

helper 

processors 

processors 

^dd 

A^dw 

A.<jh-d 

A'dc 

A-dh-h 

^hc 

A-hw 

A-2 

(S) 

(S) 

(s) 

(s) 

(s) 

(s) 

(s) 

(S) 

1 

40,474 

2 

2 

7 

2,004 

560 

13,359 

6,526 

5,540 

3,864 

15,930 

4 

4 

6 

1,975 

268 

6,103 

4,005 

2,630 

1,717 

8,352 

8 

8 

5 

1,458 

125 

2,797 

2,377 

1,230 

778 

4,385 

16 

16 

5 

875 

59 

r  1,332 

1,312 

579 

380 

2,271 

32 

32 

9 

462 

30 

690 

^687 

290 

214 

1,191 

Note:  A-di  =  Average  time  per  domain  processor  for  communication  with  domain  processors. 

A-dw  =  Average  time  per  domain  processor  waiting  for  end  of  corrqjutations  in  other  domains. 
A.dh.d  “  Average  time  per  domain  processor  for  commimication  with  helper  processors. 

A,dc  =  Average  time  per  domain  processor  for  computation,  =  A,2  -  A.dd  -  ^  -  A.dh-d- 
A-dh-h  =  Average  time  per  helper  processor  for  communication  with  domain  processors. 

A-hc  =  Average  time  per  helper  processor  for  doing  micro-element  conqjutations. 

A-hw  =  Average  time  per  helper  processor  for  waiting,  =  A,2  -  A-tc  -  Xdh-h- 
A,2  =  Average  wall  clock  run  time  per  processor. 
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From  the  results  shown  in  column  10  in  Table  2,  one  can  conclude  that  the  overall  run  times 
are  fairly  scaled  with  the  nmnber  of  processors.  The  computation  times  logged  by  the  domain 
and  helper  processors,  columns  6  and  8,  show  that  the  micro-element  computations  consume  a 
significant  amount  of  time. 

The  practical  benefit  achieved  by  using  the  helper  processors  can  be  determined  by 
considering  the  overall  run  times.  When  the  wall  clock  run  times  in  the  last  two  colirams  of 
Tables  1  and  2  are  considered  solely  in  terms  of  the  number  of  domain  processors  available  (as 
shown  in  Figure  7),  then  the  reductions  in  the  wall  clock  run  time  are  significant  with  the  helper 
processors.  However,  when  the  same  times  are  examined  by  considering  that  one  could  have 
used  helper  processors  as  domain  processors,  then  it  must  be  concluded  that  the  helper 
processors  are  not  utilized  in  an  efficient  manner.  The  run  times  with  helper  processors  show  an 
average  increase  of  35%,  but  such  cursory  examination  is  misleading  because  the  wall  clock  run 
times  included  the  wait  times.  These  times  are  significant  as  shown  for  the  domain  processors  in 
colunm  4  in  Table  1  and  column  4  in  Table  2  and  for  the  helper  processors  in  column  9  in 
Table  2.  Since  these  times  can  be  eliminated  with  suitable  input  data  set  decomposition,  the  wall 
clock  run  times  should  be  compared  without  these  wait  times.  When  this  is  done  as  shown  in 
Table  3,  the  penalty  reduces  to  an  average  of  9%.  The  results  for  the  penalty  are  plotted  against 
the  total  number  of  processors  in  Figure  8. 
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Figure  7.  Total  Run  Times  for  the  Two  Strategies. 
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Table  3.  Effective  Run  Times  in  the  Two  Strategies 


No.  of  Total 
Processors 

St 

rategy  2 

Strategy  1 

Penalty 

■M 

^2 

(S) 

Effective 
Rim  Time 
(s) 

mSM 

(s) 

Effective 
Run  Time 
(s) 

4 

20,04 

3,864 

15,930 

10,062 

8,532 

1.18 

8 

1,975 

1,717 

4,660 

1,774 

ifeiiriw 

3,927 

1.19 

16 

1,458 

778 

4,385 

mmm 

1.15 

32 

875 

380 

2,271 

1,515 

950 

1.07 

64 

462 

214 

1,191 

515 

336 

857 

521 

0.99 

No.  of  Processors 

Figure  8.  Total  Run  Times  for  the  Two  Strategies  vs.  Total  Number  of  Processors. 

In  addition  to  the  wait  times,  there  is  a  potential  for  reducing  the  communication  time 
between  the  domain  and  helper  processors  (i.e.,  the  times  shown  in  colmnns  5  and  7  of  Table  2). 
The  input  to  and  the  output  from  the  micro-element  computations  were  not  minimized  and 
packed  with  any  due  consideration.  When  this  is  done,  the  above  penalty  will  reduce  even  further 
perhaps  showing  real  gains  for  larger  size  microstructure  idealizations.  These  and  the  results 
from  the  third  strategy  are  in  progress  now. 
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5.  Concluding  Remarks 


In  this  report,  scalable  implementation  of  nonlinear  explicit  AEH  for  solving  coupled  micro 
and  macro  structural  applications  is  presented.  Scalability  of  the  approach  is  demonstrated  for  a 
standard  Taylor’s  impact  test  problem.  It  provides  some  data  for  using  the  helper  processors  for 
micro-element  computations  in  dual-level  finite  element  modeling  with  embedded  micro 
structural  models.  The  following  conclusions  can  be  identified  from  the  present  study. 

•  Penalty,  which  is  caused  by  the  additional  commimication  between  domain  and  helper 
processors,  reduces  as  the  munber  of  processors  is  increased. 

•  In  any  given  time  step,  computations  in  some  domains  may  finish  ahead  of  other 
domains.  Processors  of  these  domains  may  be  used  to  help  out  the  still  ongoing 
computations  at  other  domains.  In  the  example  problem,  a  greater  number  of  processors 
become  available  for  computing  the  plastic  elements  if  the  number  of  domains  is 
increased. 

•  The  benefit  of  having  additional  processors  for  computation  may  outweigh  the 
communication  penalty  as  the  size  of  the  micro-structural  model  is  increased. 

•  An  understanding  of  micro-material  model  details  to  be  communicated  vs.  micro  material 
model  computation  is  needed  for  material  modelers. 
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Glossary 


Unless  stated  otherwise,  all  indices  vary  according  to  dimensionality  of  the  problem  (e.g.,  i 
2,  3  in  a  three-dimensional  structure). 
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Qij 
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Sij 

sub-subscript  n 
superscript  s 
superscript  J 
superscript  n 
t 

Ti 

Ui 

Vi 

Xi 

Yi 


effective  plastic  strain 
a  corrector 

first  elasto-plastic  corrector 
second  elasto-plastic  corrector 
matrix  quantity 
matrix  transpose 
vector  quantity 

a  fourth  order  material  property  tensor 
a  fluidity  parameter 
a  strain  vector 

summed  internal  pre-stress  force  vector 
summed  external  force  vector 
summed  body  force  vector 
a  body  force  vector 
shear  modulus 
a  hardening  parameter 
bulk  modulus 
mass  matrix 

number  of  helper  processors 
a  normal  vector 

number  of  micro-elements  in  a  macro-element 
pressure 

a  fluidity  parameter 
a  normal  vector  to  the  yield  surface 
rate  dependent  radius  of  the  yield  surface 
a  deviatoric  stress  tensor 

used  for  Ui,  Vi,  Oi,  and  ei  to  denote  Y-periodic  functions 

micro/macro  continuum  solutions 

Jaumann  definition  of  a  co-rotated  quantity 

values  at  the  nth  time  step 

time 

a  surface  traction  vector 
a  displacement  vector 
a  velocity  vector 
a  position  vector 
a  position  vector 
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X(Jc 

^dd 

^dh-d 

^dh-h 


^dw 

^hc 

^hw 
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CTy 

©kj 
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a  scalar  quantity  representing  the  magnitude  of  the  radial  return  correction 
a  time  derivative 
an  internal  energy 

surface  of  a  boundary  on  which  displacement  conditions  are  prescribed 
surface  of  a  boundary  on  which  traction  conditions  are  prescribed 
a  domain 
Kronecker  delta 
a  small  positive  number 
a  Y-periodic  function 

a  scalar  quantity  used  in  the  time  integration 

run  time  with  only  domain  decomposition  and  no  helper  processors 

run  time  with  only  domain  decomposition  and  helper  processors 

average  computation  time  on  domain  processors 

average  communication  time  between  domain  processors 

average  time  for  domain  processors  for  communication  with  helper 

processors 

average  time  for  helper  processors  for  commimication  with  domain 
processors 

average  wait  time  on  domain  processors 
average  computation  time  on  helper  processors 
average  wait  time  on  helper  processors 
density 
a  stress  tensor 
static  yield  stress 
a  rotational  vector 
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